我们为神经机翻译(NMT)提供了一个开源工具包。新工具包主要基于拱形变压器(Vaswani等,2017)以及下面详述的许多其他改进,以便创建一个独立的,易于使用,一致和全面的各个领域的机器翻译任务框架。它是为了支持双语和多语言翻译任务的工具,从构建各个语料库的模型开始推断新的预测或将模型打包给提供功能的JIT格式。
translated by 谷歌翻译
In this work, we propose a new approach that combines data from multiple sensors for reliable obstacle avoidance. The sensors include two depth cameras and a LiDAR arranged so that they can capture the whole 3D area in front of the robot and a 2D slide around it. To fuse the data from these sensors, we first use an external camera as a reference to combine data from two depth cameras. A projection technique is then introduced to convert the 3D point cloud data of the cameras to its 2D correspondence. An obstacle avoidance algorithm is then developed based on the dynamic window approach. A number of experiments have been conducted to evaluate our proposed approach. The results show that the robot can effectively avoid static and dynamic obstacles of different shapes and sizes in different environments.
translated by 谷歌翻译
We introduce an approach for the answer-aware question generation problem. Instead of only relying on the capability of strong pre-trained language models, we observe that the information of answers and questions can be found in some relevant sentences in the context. Based on that, we design a model which includes two modules: a selector and a generator. The selector forces the model to more focus on relevant sentences regarding an answer to provide implicit local information. The generator generates questions by implicitly combining local information from the selector and global information from the whole context encoded by the encoder. The model is trained jointly to take advantage of latent interactions between the two modules. Experimental results on two benchmark datasets show that our model is better than strong pre-trained models for the question generation task. The code is also available (shorturl.at/lV567).
translated by 谷歌翻译
Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing vectored embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in Transformer, allowing incorporating long-range dependencies between slices inside 3D volumes. These holistic features are further utilized to define a novel 3D clustering agreement-based SSL task and masking embedding prediction inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structures segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches.
translated by 谷歌翻译
Robots have been brought to work close to humans in many scenarios. For coexistence and collaboration, robots should be safe and pleasant for humans to interact with. To this end, the robots could be both physically soft with multimodal sensing/perception, so that the robots could have better awareness of the surrounding environment, as well as to respond properly to humans' action/intention. This paper introduces a novel soft robotic link, named ProTac, that possesses multiple sensing modes: tactile and proximity sensing, based on computer vision and a functional material. These modalities come from a layered structure of a soft transparent silicon skin, a polymer dispersed liquid crystal (PDLC) film, and reflective markers. Here, the PDLC film can switch actively between the opaque and the transparent state, from which the tactile sensing and proximity sensing can be obtained by using cameras solely built inside the ProTac link. In this paper, inference algorithms for tactile proximity perception are introduced. Evaluation results of two sensing modalities demonstrated that, with a simple activation strategy, ProTac link could effectively perceive useful information from both approaching and in-contact obstacles. The proposed sensing device is expected to bring in ultimate solutions for design of robots with softness, whole-body and multimodal sensing, and safety control strategies.
translated by 谷歌翻译
语义分割是开发医学图像诊断系统的重要任务。但是,构建注释的医疗数据集很昂贵。因此,在这种情况下,半监督方法很重要。在半监督学习中,标签的质量在模型性能中起着至关重要的作用。在这项工作中,我们提出了一种新的伪标签策略,可提高用于培训学生网络的伪标签的质量。我们遵循多阶段的半监督训练方法,该方法在标记的数据集上训练教师模型,然后使用训练有素的老师将伪标签渲染用于学生培训。通过这样做,伪标签将被更新,并且随着培训的进度更加精确。上一个和我们的方法之间的关键区别在于,我们在学生培训过程中更新教师模型。因此,在学生培训过程中,提高了伪标签的质量。我们还提出了一种简单但有效的策略,以使用动量模型来提高伪标签的质量 - 训练过程中原始模型的慢复制版本。通过应用动量模型与学生培训期间的重新渲染伪标签相结合,我们在五个数据集中平均达到了84.1%的骰子分数(即Kvarsir,CVC-ClinicdB,Etis-laribpolypdb,cvc-colondb,cvc-colondb,cvc-colondb和cvc-300)和CVC-300)只有20%的数据集用作标记数据。我们的结果超过了3%的共同实践,甚至在某些数据集中取得了完全监督的结果。我们的源代码和预培训模型可在https://github.com/sun-asterisk-research/online学习SSL上找到
translated by 谷歌翻译
在现实世界应用中,联合学习(FL)遇到了两个挑战:(1)可伸缩性,尤其是应用于大型物联网网络时; (2)如何使用异质数据对环境进行健全。意识到第一个问题,我们旨在设计一个名为Full-Stack FL(F2L)的新型FL框架。更具体地说,F2L使用层次结构架构,使扩展FL网络可以访问而无需重建整个网络系统。此外,利用层次网络设计的优势,我们在全球服务器上提出了一种新的标签驱动知识蒸馏(LKD)技术来解决第二个问题。与当前的知识蒸馏技术相反,LKD能够训练学生模型,该模型由所有教师模型的良好知识组成。因此,我们提出的算法可以有效地提取区域数据分布(即区域汇总模型)的知识,以减少客户在使用非独立分布数据的FL系统下操作时客户模型之间的差异。广泛的实验结果表明:(i)我们的F2L方法可以显着提高所有全球蒸馏的总体FL效率,并且(ii)F2L随着全球蒸馏阶段的发生而迅速达到收敛性,而不是在每个通信周期中提高。
translated by 谷歌翻译
腿部机器人在驾驶不平坦的地形方面表现出显着的优势。但是,实现四足机器人的有效运动和操纵任务仍然具有挑战性。另外,在这些问题中,机器人通常未知对象和地形参数。因此,本文提出了一个层次自适应控制框架,该框架使腿部机器人能够执行机车操作任务,而无需对物体的质量,摩擦系数或地形的斜率进行任何给定的假设。在我们的方法中,我们首先提出一种自适应操纵控制,以调节接触力,以操纵未知地形上的未知物体。然后,我们引入了一个统一的模型预测控制(MPC),以考虑机器人的操作,该控制力考虑了机器人动力学中的操纵力。因此,提出的MPC框架可以有效地调节机器人与物体之间的相互作用力,同时保持机器人平衡。我们提出的方法的实验验证成功地在Unitree A1机器人上进行了,使其可以操纵未知的时间变化负载,最高$ 7 $ $ kg $($ 60 \%的机器人重量)。此外,我们的框架可以快速适应未知斜率(最高$ 20^\ circ $)或具有不同摩擦系数的不同表面。
translated by 谷歌翻译
本文介绍了一个新颖的自适应频率MPC框架,用于在地形上具有不均匀的垫脚石上的两足球运动。详细说明,我们打算使用此MPC实现双足体周期步态的自适应脚部和步态,以便在不慢下放慢速度的情况下以不连续性穿越地形。我们将这种自适应频率MPC与Kino-Dynamics轨迹优化,以实现最佳步态时期,质量中心(COM)轨迹和脚部位置。我们使用全身控制(WBC)以及自适应频率MPC来跟踪离线优化的最佳轨迹。在数值验证中,我们具有优化的自适应频率MPC框架已显示出比固定频率MPC的优势。所提出的框架可以控制两足动物的机器人,穿过具有扰动的石头高度,宽度和表面形状的不均匀的垫脚石地形,同时保持平均速度为1.5 m/s。
translated by 谷歌翻译
本文提出了一种通过多接触模型预测控制(MPC)框架来控制类人形机器人动态机器人操作的新方法。在此框架中,我们提出了一个多接触动力学模型,该模型可以代表机车操作中的不同接触模式(例如,与地面与对象和脚接触的手接触)。提出的动力学模型简化了对象动力学作为应用于系统(外部力模型)的外力,以确保MPC问题的简单性和可行性。在数值验证中,我们的多接触MPC框架只需要每个任务的接触时间,并且所需状态才能使MPC了解Loco-Manipulation中预测范围中接触模式的变化。所提出的框架可以控制类人机器人的机器人,以完成多任务的动态机车操作应用,例如在转弯和行走时有效拾取和掉落物体。
translated by 谷歌翻译